The Making of the Royal Society Corpus

Authors

  • Jörg Knappen
  • Stefan Fischer
  • Hannah Kermes
  • Elke Teich
  • Peter Fankhauser

Abstract

The Royal Society Corpus is a corpus of Early and Late Modern English comprising approximately 30 million words. Built in an agile process, it covers publications of the Royal Society of London from 1665 to 1869 (Kermes et al., 2016). In this paper we provide details on two aspects of the building process: the mining of patterns for OCR correction, and the improvement and evaluation of part-of-speech tagging.
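The abstract names pattern mining for OCR correction but does not describe the procedure. As a hedged illustration only, the Python sketch below assumes one simple lexicon-based approach, which is not necessarily the authors' method: a token missing from a reference lexicon is treated as a possible OCR error, and every single-character substitution that turns it into a known word is counted as a candidate (wrong, right) pattern, weighted by token frequency. The function name `mine_substitution_patterns`, the toy lexicon, and the sample tokens are hypothetical. The example targets a well-known problem in historical print, where the long s (ſ) is often recognized as f.

```python
from collections import Counter


def mine_substitution_patterns(tokens, lexicon):
    """Hypothetical helper: count single-character OCR substitution patterns.

    A token absent from `lexicon` is treated as a possible OCR error. Each
    one-character replacement that yields a lexicon word is recorded as a
    (wrong, right) character pair, weighted by the token's frequency.
    """
    # Candidate replacement characters: every character seen in the lexicon.
    alphabet = sorted({ch for word in lexicon for ch in word})
    patterns = Counter()
    for token, freq in Counter(tokens).items():
        if token in lexicon:
            continue  # known word: not treated as a correction candidate
        for i, wrong in enumerate(token):
            for right in alphabet:
                if right == wrong:
                    continue
                candidate = token[:i] + right + token[i + 1:]
                if candidate in lexicon:
                    patterns[(wrong, right)] += freq
    return patterns


if __name__ == "__main__":
    # Toy data (hypothetical): "moft" and "obferved" mimic long-s misreadings.
    tokens = ["moft", "moft", "obferved", "the", "experiment"]
    lexicon = {"most", "observed", "the", "experiment"}
    print(mine_substitution_patterns(tokens, lexicon))
    # prints Counter({('f', 's'): 3})
```

On the toy input the only pattern found is f→s, weighted 3 (two occurrences of "moft", one of "obferved"). A lexicon check of this kind cannot flag misreadings that happen to produce real words, such as "same" read as "fame", so mined patterns would still need manual review before being applied as correction rules.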

Similar resources

An Investigation of the Generic Features of Research Articles Published in the Bulletin of Iranian Mathematical Society

In light of the understanding that the analysis of the generic features of different academic genres can enhance the ability of non-native members of academic discourse communities to understand, and where appropriate, to produce them, the present study aimed at investigating the dominant generic structure of research articles in mathematics. To start with a relatively narrow focus, a corpus of...

Painting and Society: The Formation of the Persian Painting in the 14th Century

Persian painting has usually been studied from a historical point of view. But its formation is rooted in a specific social context. In this study, we will try to contextualize it and we will show that this social context has a crucial role regarding its aesthetic. Persian painting is an art of royal courts and it represents the life of princes combined with Persian epic legends. This social co...

The Royal Society Corpus: From Uncharted Data to Corpus

We present the Royal Society Corpus (RSC) built from the Philosophical Transactions and Proceedings of the Royal Society of London. At present, the corpus contains articles from the first two centuries of the journal (1665–1869) and amounts to around 35 million tokens. The motivation for building the RSC is to investigate the diachronic linguistic development of scientific English. Specifically...

Treatment of hypospadias.

Release of the corpus is not so much obtained by the resection of tissue (chordee) as by dissection of the corpora cavernosa. It is important that this dissection should be completed. Too often, I have operated on patients with a so-called recurrence of chordee. My personal view is that no such process exists. A corpus cavernosum that is well released and has received an adequate skin cover sho...

Developing a Corpus-Based Word List in Pharmacy Research Articles: A Focus on Academic Culture

The present corpus-based lexical study reports the development of a Pharmacy Academic Word List (PAWL); a list of the most frequent words from a corpus of 3,458,445 tokens made up of the 800 most recent pharmacy texts, including research articles, review articles, and short communications in four sub-disciplines of pharmacy. WordSmith (Scott, 2017) and AntWordProfiler (Anthony, 2014) were used to sc...

Publication date: 2017